
Worldwide Web Crawls

Wide crawls of the Internet conducted by Internet Archive. Please visit the Wayback Machine to explore archived web sites.




634,931
RESULTS


Wide Crawl Number 17: Started August 3rd, 2018
Collection: 69,709 items, 2.2B views

Wide17 was seeded with the "Total Domains" list of 256,796,456 URLs provided by Domains Index on June 26th, and crawled with max-hops set to "3" and de-duplication set to "on".
Collection: 2.6B views

The seed for Wide00014 was:
- Slash pages from every domain on the web:
-- a list of domains using Survey crawl seeds
-- a list of domains using Wide00012 web graph
-- a list of domains using Wide00013 web graph
- Top ranked pages (up to a max of 100) from every linked-to domain using the Wide00012 inter-domain navigational link graph
-- a ranking of all URLs that have more than one incoming inter-domain link (rank was determined by number of incoming links using Wide00012 inter domain links)...
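The "top ranked pages" rule in the Wide00014 seed description can be sketched roughly as follows. This is only an illustration, not the Internet Archive's actual tooling: the function name `top_pages_per_domain`, the input shape (pairs of source and target URLs), and the use of the URL hostname as the "domain" are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def top_pages_per_domain(links, max_per_domain=100):
    """Rank linked-to URLs by incoming inter-domain links (hypothetical helper).

    links: iterable of (source_url, target_url) pairs.
    Returns {host: [urls]} with at most max_per_domain URLs per host,
    keeping only URLs with more than one incoming inter-domain link.
    """
    incoming = Counter()
    # Count each distinct (source, target) pair once.
    for src, dst in set(links):
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        # Assumption: "inter-domain" means the two hostnames differ.
        if src_host and dst_host and src_host != dst_host:
            incoming[dst] += 1

    by_host = defaultdict(list)
    for url, count in incoming.items():
        if count > 1:  # "more than one incoming inter-domain link"
            by_host[urlsplit(url).hostname].append((count, url))

    ranked = {}
    for host, pairs in by_host.items():
        # Highest link count first; ties broken by URL for stable output.
        pairs.sort(key=lambda p: (-p[0], p[1]))
        ranked[host] = [url for _, url in pairs[:max_per_domain]]
    return ranked
```

A real link graph at this scale would not fit in memory, so the counting step would need to be distributed; the sketch only shows the ranking rule itself.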
Collection: 1.6B views

Web wide crawl.
Collection: 1.4B views

Web wide crawl number 16. The seed list for Wide00016 was made from the join of the top 1 million domains from CISCO and the top 1 million domains from Alexa.
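One way to picture the "join" of two top-domain lists is as a de-duplicated union. The sketch below assumes that reading; the source does not say how the lists were actually combined, and the function name `merge_seed_domains` and the normalization steps (lowercasing, dropping a trailing dot) are hypothetical.

```python
def merge_seed_domains(*ranked_lists):
    """Merge ranked domain lists into one list without duplicates.

    Assumption: "join" means a union; first-seen order is preserved.
    """
    seen = set()
    merged = []
    for ranked in ranked_lists:
        for domain in ranked:
            # Normalize so "Example.com" and "example.com." match.
            key = domain.strip().lower().rstrip(".")
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```

Because the two source lists overlap, a union built this way holds fewer than 2 million entries.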
Wide Crawl Number 12 - started March, 14th 2015
Collection: 49,621 items, 1.4B views

Web wide crawl with initial seedlist and crawler configuration from January 2015.
Wide Crawl started June 2014
Collection: 45,341 items, 1.3B views

Web wide crawl with initial seedlist and crawler configuration from June 2014.
Wide Crawl started April 2013
Collection: 25,035 items, 1.5B views

Web wide crawl with initial seedlist and crawler configuration from April 2013.
Wide Crawl Number 13
Collection: 46,050 items, 1.1B views

Web Wide Crawl Number 13
Wide Crawl started August 2013
Collection: 21,932 items, 954.4M views

Web wide crawl with initial seedlist and crawler configuration from August 2013.
Wide Crawl started January 2012
Collection: 30,373 items, 824.9M views

Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Wide Crawl started April 2012
Collection: 39,279 items, 723.9M views

Web wide crawl with initial seedlist and crawler configuration from April 2012.
Wide Crawl started February 2014
Collection: 9,806 items, 616.2M views

Web wide crawl with initial seedlist and crawler configuration from February 2014.
Wide Crawl Started January 2013
Collection: 15,157 items, 540.5M views

Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Host Screen Captures
Collection: 17,458 items, 223.7M views

Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.
Wide Crawl started October 2010
Collection: 15,839 items, 554.2M views

Web wide crawl with initial seedlist and crawler configuration from October 2010
Wide Crawl started October 2011
Collection: 12,648 items, 514.7M views

Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Wide Crawl started September 2012
Collection: 22,423 items, 536.4M views

Web wide crawl with initial seedlist and crawler configuration from September 2012.
Wide Crawl started March 2011
Collection: 8,528 items, 471.7M views

Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
Wide Crawl Number 18
Collection: 7,442 items, 53.9M views

Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Sun Oct 16 02:25:38 PDT 2016 to Sat Oct 15 21:19:02 PDT 2016.
Topic: crawldata
Wide Crawl started September 2010
Collection: 332 items, 16.6M views

Web wide crawl with initial seedlist and crawler configuration from September 2010
Wide Crawl Number 17: Started August 3rd, 2018
web: 6M views, 1 favorite, 5 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon May 13 17:55:38 PDT 2019 to Mon May 13 13:45:55 PDT 2019.
5 reviews
Topic: crawldata
Wide Crawl started February 2014
web: 8.7M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web: 8.5M views, 0 favorites, 1 comment

Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
1 review
Topic: crawldata
Wide Crawl started February 2014
web: 8.6M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web: 8.6M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web: 8.6M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Wide Crawl started April 2012
web: 110,546 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Wed Jul 11 02:18:27 PDT 2012 to Tue Jul 10 20:56:49 PDT 2012.
Topic: crawldata
Wide Crawl started April 2012
web: 108,974 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Wed Jul 11 03:18:20 PDT 2012 to Tue Jul 10 22:00:25 PDT 2012.
Topic: crawldata
Host Screen Captures
web: 84,398 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl431.us.archive.org:widewebcap from Fri May 18 01:26:09 PDT 2012 to Thu May 17 19:17:10 PDT 2012.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 1.2M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Tue Oct 9 21:09:32 PDT 2018 to Tue Oct 9 18:23:47 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl825.us.archive.org:wide from Thu Apr 7 11:34:12 PDT 2016 to Thu Apr 7 06:58:33 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Apr 7 15:00:19 PDT 2016 to Thu Apr 7 09:30:57 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Thu Apr 7 12:24:26 PDT 2016 to Thu Apr 7 08:51:27 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Thu Apr 7 17:22:05 PDT 2016 to Thu Apr 7 13:08:49 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Apr 7 12:21:53 PDT 2016 to Thu Apr 7 07:27:22 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Thu Apr 7 12:21:12 PDT 2016 to Thu Apr 7 07:19:38 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Thu Apr 7 12:12:12 PDT 2016 to Thu Apr 7 07:26:33 PDT 2016.
Topic: crawldata
Host Screen Captures
web: 427,664 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl831.us.archive.org:widewebcap from Wed Feb 5 22:06:36 PST 2020 to Fri Feb 7 04:47:16 PST 2020.
Topic: crawldata
Wide Crawl started February 2014
web: 2M views, 1 favorite, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Tue Feb 18 06:45:58 PST 2014 to Tue Feb 18 01:16:03 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Jun 29 15:44:55 PDT 2016 to Wed Jun 29 10:43:30 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Wed Jun 29 17:31:53 PDT 2016 to Wed Jun 29 15:11:43 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl835.us.archive.org:wide from Thu Jun 30 10:52:08 PDT 2016 to Thu Jun 30 06:07:26 PDT 2016.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 632,569 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Wed Aug 22 20:13:22 PDT 2018 to Wed Aug 22 15:42:40 PDT 2018.
Topic: crawldata
survey_00010
web: 476,495 views, 0 favorites, 0 comments

"Internet Archive crawldata from feed-driven by 1.2 million top ranked domains from data.domainrank.io - captured by crawl421.us.archive.org:survey_00010 from Sun May 24 19:26:13 PDT 2020 to Sun May 24 15:10:12 PDT 2020."
Topics: survey_00010, crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 35,767 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Fri Jun 21 07:00:18 PDT 2019 to Fri Jun 21 12:14:54 PDT 2019.
Topic: crawldata
Host Screen Captures
web: 1.7M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl431.us.archive.org:widewebcap from Tue Jan 17 22:09:27 UTC 2012 to Wed Feb 1 11:19:58 UTC 2012.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Fri Oct 21 18:47:16 PDT 2016 to Fri Oct 21 13:12:51 PDT 2016.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 714,803 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Sat Sep 1 05:24:22 PDT 2018 to Fri Aug 31 23:58:45 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 707,640 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Sat Aug 18 11:17:19 PDT 2018 to Sat Aug 18 07:39:16 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl815.us.archive.org:wide from Thu Mar 24 13:02:27 PDT 2016 to Thu Mar 24 07:48:54 PDT 2016.
Topic: crawldata
Host Screen Captures
web: 1.5M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl431.us.archive.org:widewebcap from Wed Jan 18 15:27:56 UTC 2012 to Wed Feb 1 15:16:33 UTC 2012.
Topic: crawldata
Wide Crawl started September 2012
web: 56,863 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Wed Nov 14 21:57:57 PST 2012 to Wed Nov 14 18:59:04 PST 2012.
Topic: crawldata
Wide Crawl started September 2012
web: 51,038 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Wed Nov 14 20:18:09 PST 2012 to Wed Nov 14 14:43:28 PST 2012.
Topic: crawldata
Host Screen Captures
web: 1.2M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl834.us.archive.org:widewebcap from Sat Mar 31 19:57:25 PDT 2018 to Wed Dec 5 22:39:00 PST 2018.
Topic: crawldata
Host Screen Captures
web: 720,801 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl834.us.archive.org:widewebcap from Fri Oct 21 01:30:25 PDT 2016 to Mon Nov 14 13:38:53 PST 2016.
Topic: crawldata
Wide Crawl started October 2011
web: 579,329 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Thu Dec 1 10:17:20 PST 2011 to Thu Dec 1 04:31:54 PST 2011.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 772,863 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Sun Sep 30 07:26:37 PDT 2018 to Sun Sep 30 03:22:20 PDT 2018.
Topic: crawldata
Host Screen Captures
web: 1M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl431.us.archive.org:widewebcap from Fri Apr 20 00:57:07 PDT 2012 to Thu Apr 19 20:54:24 PDT 2012.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Mon Aug 6 09:28:15 PDT 2018 to Wed Aug 8 18:02:43 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.5M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Sun Aug 5 00:35:40 PDT 2018 to Tue Aug 7 01:25:09 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.3M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Sun Aug 5 01:06:54 PDT 2018 to Mon Aug 6 15:07:05 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Sun Aug 5 05:35:10 PDT 2018 to Mon Aug 6 17:04:32 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Sun Aug 5 05:36:11 PDT 2018 to Mon Aug 6 06:38:33 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Mon Aug 6 12:03:20 PDT 2018 to Wed Aug 8 23:51:48 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 1 favorite, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl808.us.archive.org:wide from Sat Aug 4 22:37:55 PDT 2018 to Mon Aug 6 22:17:37 PDT 2018.
Topic: crawldata
Wide Crawl started February 2014
web: 5.2M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Wed Feb 19 08:18:23 PST 2014 to Wed Feb 19 04:21:37 PST 2014.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Sun Aug 5 03:33:48 PDT 2018 to Tue Aug 7 01:48:53 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Sun Aug 5 04:29:12 PDT 2018 to Mon Aug 6 09:39:30 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.3M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl807.us.archive.org:wide from Sun Aug 5 04:40:59 PDT 2018 to Tue Aug 7 09:40:31 PDT 2018.
Topic: crawldata
Wide Crawl started February 2014
web: 5.1M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Tue Feb 18 22:58:46 PST 2014 to Tue Feb 18 19:25:19 PST 2014.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun Aug 5 05:32:56 PDT 2018 to Tue Aug 7 09:24:53 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Mon Aug 6 13:38:33 PDT 2018 to Wed Aug 8 08:00:38 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.3M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl804.us.archive.org:wide from Sun Aug 5 04:04:57 PDT 2018 to Tue Aug 7 13:12:57 PDT 2018.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web: 2.4M views, 1 favorite, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Sun Aug 5 03:16:34 PDT 2018 to Mon Aug 6 16:55:41 PDT 2018.
Topic: crawldata